AMENDMENTS TO THE SPECIFICATION 



Please replace paragraph [0011] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

--[0011] [6] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for
model fitting with applications to image analysis and automated cartography. In
Communications of the ACM, volume 26, 1981.--

Please replace paragraph [0014] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

--[0014] [9] E. Grimson, P. Viola, O. Faugeras, T. Lozano-Perez, T. Poggio, and S.
Teller. A forest of sensors. In International Conference on Computer Vision,
pages 45-51, 1997.

Please replace paragraph [0032] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

--[0032] [27] M. Irani, P. Anandan, J. Bergen, R. Kumar, and S. Hsu. Efficient
Representations of Video Sequences and Their Applications. Signal Processing: Image
Communication, special issue on Image and Video Semantics: Processing, Analysis, and
Application, Vol. 8, No. 4, May 1996.--

Please replace paragraph [0058] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

--[0058] In the direct method, if there are no temporal changes in the scene, then the
temporal derivatives within the sequence are zero: $S_t = 0$. Therefore, for any space-time
point (x, y, t), the error term of Eq. (1) presented below reduces to:--

Please replace paragraph [0063] of the published patent application (US
2002/0094135) with the following rewritten paragraph: 

--[0063] The sequence-to-sequence paradigm is not limited to direct
methods, but can equally be implemented using feature-based methods. Feature-based
methods first apply a local operator to detect singularity points in an image (e.g., the
Harris corner detector) [11]. Once two sets of singularity points are extracted, robust
estimation methods such as RANSAC [6], LMS [7], etc. are used for finding corresponding
points and extracting the alignment parameters.--
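The robust-estimation step can be sketched as follows. This is a minimal illustration of RANSAC [6], assuming putative point correspondences have already been formed and, for brevity, assuming a pure 2D translation model rather than the affine or projective models used elsewhere in this application. The function name `ransac_translation` and its parameters are hypothetical.

```python
import random


def ransac_translation(src, dst, n_iters=200, tol=1.0, seed=0):
    """Estimate a 2D translation (dx, dy) mapping src points onto dst points.

    src, dst: equal-length lists of (x, y) putative correspondences, some of
    which may be outliers. Returns ((dx, dy), inlier_indices).
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iters):
        # A translation is fixed by a single correspondence (minimal sample of 1).
        i = rng.randrange(len(src))
        dx = dst[i][0] - src[i][0]
        dy = dst[i][1] - src[i][1]
        # Consensus set: correspondences that agree with this hypothesis.
        inliers = [j for j, ((sx, sy), (tx, ty)) in enumerate(zip(src, dst))
                   if (sx + dx - tx) ** 2 + (sy + dy - ty) ** 2 <= tol ** 2]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on the consensus set: the least-squares translation is the mean offset.
    n = len(best_inliers)
    dx = sum(dst[j][0] - src[j][0] for j in best_inliers) / n
    dy = sum(dst[j][1] - src[j][1] for j in best_inliers) / n
    return (dx, dy), best_inliers


src = [(0, 0), (10, 5), (3, 8), (7, 1), (20, 20)]
dst = [(5, -2), (15, 3), (8, 6), (12, -1), (0, 0)]  # the last pair is an outlier
(dx, dy), inliers = ransac_translation(src, dst)
print((dx, dy), inliers)  # (5.0, -2.0) [0, 1, 2, 3]
```

Because one correspondence fixes a translation, each hypothesis is drawn from a single match; richer models need larger minimal samples (for example, three matches for an affine transformation).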

Please replace paragraph [0064] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

--[0064] To address sequences instead of images, we extend the meaning of a feature from a
feature point to a feature trajectory, that is, a trajectory of points representing the
feature's location in each frame within each sequence. Thus the second step finds
correspondences between trajectories of points (the feature's x, y coordinates along the
sequence). Note that in sequence-to-sequence alignment these trajectories do not
necessarily belong to a moving object, but may include prominent features which belong to
a static object. Such a feature produces a constant trajectory, which is nonetheless a
valid trajectory.--
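A brute-force sketch of matching one pair of feature trajectories is shown below. It assumes a translation-only spatial model and integer frame shifts; the function `align_trajectories` is hypothetical and is not the estimation procedure of this application. A static feature yields a constant trajectory, which by itself constrains (dx, dy) but not dt, so the example uses a curved trajectory.

```python
def align_trajectories(traj_ref, traj_sec, max_shift):
    """Find an integer time shift dt and translation (dx, dy) such that
    traj_sec[t] + (dx, dy) ~= traj_ref[t + dt].

    traj_ref, traj_sec: lists of (x, y), one entry per frame.
    Returns (dt, (dx, dy), mean_squared_error), or None if nothing overlaps.
    """
    best = None
    for dt in range(-max_shift, max_shift + 1):
        # Frames t of the second sequence that have a partner t + dt in the reference.
        pairs = [(traj_sec[t], traj_ref[t + dt])
                 for t in range(len(traj_sec)) if 0 <= t + dt < len(traj_ref)]
        if len(pairs) < 2:
            continue
        # For a fixed dt, the least-squares translation is the mean offset.
        dx = sum(r[0] - s[0] for s, r in pairs) / len(pairs)
        dy = sum(r[1] - s[1] for s, r in pairs) / len(pairs)
        mse = sum((s[0] + dx - r[0]) ** 2 + (s[1] + dy - r[1]) ** 2
                  for s, r in pairs) / len(pairs)
        if best is None or mse < best[2]:
            best = (dt, (dx, dy), mse)
    return best


# Hypothetical curved (parabolic) trajectory seen by the reference camera.
ref = [(float(t), 0.1 * t * t) for t in range(30)]
# Second view: sec[t] + (4, -1) equals ref[t + 3].
sec = [(t + 3 - 4.0, 0.1 * (t + 3) ** 2 + 1.0) for t in range(27)]
print(align_trajectories(ref, sec, max_shift=5))
```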

Please replace paragraph [0065] of the published patent application (US
2002/0094135) with the following rewritten paragraph: 

--[0065] Feature-based sequence-to-sequence alignment is a generalization of feature-based
image-to-image alignment. If we consider a scene without moving objects, all
trajectories become constant trajectories of static objects, each equivalent to a single
feature point, so sequence-to-sequence alignment reduces to image-to-image alignment.--

Please replace paragraph [0070] of the published patent application (US
2002/0094135) with the following rewritten paragraph: 

—[0070] The paradigm of sequence-to-sequence alignment extends beyond any particular 
method. It can equally apply to feature-based matching across sequences, or other types 
of match measures (e.g., mutual information). — 

Please replace paragraph [0082] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

--[0082] FIG. 7C is a pictorial illustration of some possible alignments between the
frames of FIGS. 7A and 7B;--

Please replace paragraph [0103] of the published patent application (US
2002/0094135) with the following rewritten paragraph: 

--[0103] FIG. 6 is a diagram of a preferred method for subsampling and aligning image
sequences according to a preferred embodiment of the present invention, where $S_0$ is an
original image sequence, $S_1$ is subsampled from $S_0$ as described herein, $S_2$ is
subsampled from $S_1$ similarly, and so on.--

Please replace paragraph [0104] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

—[0104] FIG. 6 illustrates a preferred hierarchical spatio-temporal alignment framework.
A volumetric pyramid is constructed for each input sequence, one for the reference 
sequence (on the right side), and one for the second sequence (on the left side). The 
spatio-temporal alignment estimator is applied iteratively at each level. It refines the 
approximation based on the residual misalignment between the reference volume and a 
warped version of the second volume (drawn as a skewed cube). The output of the 
current level is propagated to the next level to be used as an initial estimate. — 
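The pyramid construction can be sketched as below. As a simplifying assumption, each level halves all three axes (x, y and t) by averaging 2x2x2 blocks of voxels; the actual blur and subsampling filters are not specified in this paragraph, and `volumetric_pyramid` is a hypothetical name. Because every axis is halved, a translational displacement (u, v, w) estimated at one level corresponds to roughly 2(u, v, w) at the next finer level, which is how such an estimate would be propagated as an initial guess.

```python
import numpy as np


def volumetric_pyramid(seq, levels):
    """Build a spatio-temporal pyramid by 2x2x2 block averaging.

    seq: array of shape (T, H, W). Level 0 is the input; each further level
    halves T, H and W (an odd trailing frame, row or column is cropped).
    """
    pyramid = [np.asarray(seq, dtype=float)]
    for _ in range(levels - 1):
        v = pyramid[-1]
        t, h, w = (s // 2 * 2 for s in v.shape)
        v = v[:t, :h, :w]
        # Average each 2x2x2 block of neighbouring voxels.
        v = v.reshape(t // 2, 2, h // 2, 2, w // 2, 2).mean(axis=(1, 3, 5))
        pyramid.append(v)
    return pyramid


seq = np.arange(8 * 8 * 8, dtype=float).reshape(8, 8, 8)
pyr = volumetric_pyramid(seq, levels=3)
print([p.shape for p in pyr])  # [(8, 8, 8), (4, 4, 4), (2, 2, 2)]
```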
Please replace paragraph [0106] of the published patent application (US
2002/0094135) with the following rewritten paragraph: 

—[0106] FIGS. 8A-8D illustrate spatio-temporal ambiguity in alignment when using only 
temporal information. A small airplane is crossing a scene viewed by two cameras. The 
airplane trajectory does not suffice to uniquely determine the alignment parameters. 
Arbitrary time shifts can be compensated by appropriate spatial translation along the 
airplane motion direction. Sequence-to-sequence alignment, on the other hand, uniquely
resolves this ambiguity, as it uses both the scene dynamics (the plane at
different locations), and the scene appearance (the static ground). Note that spatial 
information alone does not suffice in this case either. — 
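The ambiguity can be checked numerically. The sketch below assumes a single point moving at constant velocity (vx, vy) and identical camera geometry; for every time shift dt, a spatial translation of (vx*dt, vy*dt) makes the trajectory residual vanish, so the trajectory alone cannot single out dt. All names and values are hypothetical.

```python
def trajectory_residual(traj_ref, traj_sec, dt, dx, dy, n_frames=20):
    """Mean squared distance between traj_sec(t) + (dx, dy) and traj_ref(t + dt).

    Trajectories are callables t -> (x, y), so sub-frame shifts dt are allowed.
    """
    err = 0.0
    for t in range(n_frames):
        xs, ys = traj_sec(float(t))
        xr, yr = traj_ref(t + dt)
        err += (xs + dx - xr) ** 2 + (ys + dy - yr) ** 2
    return err / n_frames


# Hypothetical airplane moving at constant velocity (vx, vy) pixels per frame.
vx, vy = 3.0, 0.5


def plane(t):
    return (10.0 + vx * t, 40.0 + vy * t)


# Each candidate time shift is exactly compensated by the translation (vx*dt, vy*dt).
residuals = {dt: trajectory_residual(plane, plane, dt, vx * dt, vy * dt)
             for dt in (-2.0, 0.0, 1.5, 7.0)}
print(residuals)
```

Adding the static ground as a second source of constraints fixes (dx, dy) independently of dt, which removes this one-parameter family of solutions.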

Please replace paragraph [0108] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

--[0108] FIG. 11 illustrates a scene with moving objects. Lines 11(a) and 11(b) display four
representative frames (100, 200, 300, 400) from the reference and second sequences,
respectively. The spatial misalignment is easily noticeable near image boundaries, where
different static objects are visible in each sequence. The temporal misalignment is
noticeable by comparing the position of the gate in frame 400 of each sequence: in the
second sequence it is already open, while it is still closed in the reference sequence.
Line 11(c) displays superposition of the representative frames before spatio-temporal
alignment. The superposition combines the red and blue bands from the reference sequence
with the green band from the second sequence. Line 11(d) displays superposition of
corresponding frames after spatio-temporal alignment. The dark pink boundaries in (d)
correspond to scene regions observed only by the reference camera. The dark green
boundaries in (d) correspond to scene regions observed only by the second camera.--

Please replace paragraph [0110] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

--[0110] FIG. 13 illustrates a scene with non-rigid motion. Lines 13(a) and 13(b) display
four representative frames (0, 100, 200, 300) from the reference and second sequences,
respectively. Line 13(c) displays superposition of the representative frames before
spatio-temporal alignment. The spatial misalignment between the sequences is primarily
due to the difference in the cameras' focal lengths (i.e., differences in scale). The
temporal misalignment is most evident in frame 300 of line 13(a) vs. frame 300 of line
13(b), where the wind blows the flag in opposite directions. Line 13(d) displays
superposition of corresponding frames after spatio-temporal alignment.--

Please replace paragraph [0111] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

--[0111] FIGS. 14-16 illustrate a scene which constantly changes its appearance. FIGS.
14 and 15 display 10 frames (20, 30, ..., 110) from the reference and second sequences,
respectively. It is difficult to tell the connection between the two sequences. The event in
frames 90-110 in the reference sequence (FIG. 14) is the same as the event in frames 20-40
in the second sequence (FIG. 15). FIG. 16A displays superposition of the
representative frames before spatio-temporal alignment. FIG. 16B displays superposition
of corresponding frames after spatio-temporal alignment. Due to the scale difference,
the two sequences overlap only in the upper right region of every frame. Fireworks in the
non-overlapping regions appear dark pink, as they were observed by only one camera.
Fireworks in the overlapping regions appear white, as they should.
The recovered temporal misalignment was approximately 66 frames.--

Please replace paragraph [0114] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

"[0114] For example, as shown in FIG. 1, a plurality of sequences of images are 
received, such as three sequences 50, 90 and 120 in the illustrated embodiment, captured 
by different image capturing devices typically imaging the same scene 180. In the 
illustrated embodiment, the image capturing devices are cameras I, II and III also 
designated by reference numerals 20, 30 and 40, respectively. Each sequence, as shown,
comprises an ordered multiplicity of images. For example, the sequence imaged by
camera 20 is shown, for simplicity, to include three images 60, 70 and 80.--

Please replace paragraph [0115] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

--[0115] A particular advantage of a preferred embodiment of the invention as shown and
described herein is that it addresses a problem illustrated in FIG. 1: individual imaging
processes, each of which has limitations, can, due to those limitations, represent an event
so imperfectly as to be genuinely misleading. For example, as shown in FIG. 1, due to the
insufficient temporal sampling employed by each of the three imaging devices 20, 30 and
40, none of the devices succeeds in correctly representing the S-shaped trajectory
actually followed by the ball in the true scene 160. The first camera 20, as shown in FIGS.
2A-2B, perceives a straight-line trajectory because it images the ball only at positions 1,
6 and 11, which happen to fall roughly along a straight line. The second camera 30, as
shown in FIGS. 3A-3B, perceives a banana-shaped trajectory because it images the ball
only at positions 2, 5 and 8, which happen to fall roughly along a banana-shaped curve.
The third camera 40, as shown in FIGS. 4A-4B, perceives an inverted banana-shaped
trajectory because it images the ball only at positions 3, 7 and 11, which happen to fall
roughly along an inverted banana-shaped curve.--

Please replace paragraph [0118] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

--[0118] In FIG. 5, the sequences are pictorially shown to be spatially aligned, as
evidenced by the three different orientations of the frames of type I, originating from
camera I in FIG. 1, the frames of type II, originating from camera II, and the frames of
type III, originating from camera III. As shown, the frames of type I are skewed such that
their upper right hand corners have been pivoted upward, the frames of type III are
skewed such that their upper left hand corners have been pivoted downward, and the
frames of type II are skewed intermediately between the frames of types I and III. The
particular spatial misalignment illustrated pictorially in FIG. 1 is merely illustrative and is
not intended to be limiting. Computational methods for effecting the alignment shown
pictorially in FIG. 5 are described in detail herein.--

Please replace paragraph [0136] of the published patent application (US
2002/0094135) with the following rewritten paragraph: 

--[0136] A different linearization (with respect to (x, y, t)) is possible as well:

$$e(x,y,t;P) = S'(x,y,t) - S(x,y,t) + [u\ v\ w]\,\nabla S(x,y,t) \qquad (55)$$--

Please replace paragraph [0138] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

[0138] where $\nabla S$ denotes the spatio-temporal gradient of the sequence S. Eq. (55)
directly relates the unknown displacements (u, v, w) to measurable brightness variations
within the sequence. To allow for large spatio-temporal displacements (u, v, w), the
minimization of Equations (1), (3) or (5) is done within an iterative-warp coarse-to-fine
framework as described herein.
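For the simplest case of a single constant displacement (u, v, w) over the whole volume, minimizing the sum of squared Eq. (55) errors is a 3x3 linear least-squares problem. The NumPy sketch below performs one such step, assuming arrays indexed (t, y, x) and gradients from central differences (`numpy.gradient`). It omits the iterative warping and the coarse-to-fine pyramid, so it is only meaningful for sub-voxel displacements; the function name and the synthetic volume are hypothetical.

```python
import numpy as np


def estimate_global_shift(S, S_prime):
    """One linearized least-squares step for a constant displacement (u, v, w).

    Minimizes the sum over voxels of e**2 with
    e = S' - S + [u v w] . grad(S)   (the form of Eq. (55)).
    S and S_prime are arrays of shape (T, H, W), i.e. axes ordered (t, y, x).
    """
    S = np.asarray(S, dtype=float)
    S_t, S_y, S_x = np.gradient(S)  # central differences along t, y, x
    G = np.stack([S_x.ravel(), S_y.ravel(), S_t.ravel()], axis=1)
    b = -(np.asarray(S_prime, dtype=float) - S).ravel()
    # Least-squares solution of G @ [u, v, w] = -(S' - S).
    uvw, *_ = np.linalg.lstsq(G, b, rcond=None)
    return uvw


# Synthetic smooth volume and a copy displaced by a known sub-voxel (u, v, w).
t, y, x = np.meshgrid(np.arange(20), np.arange(32), np.arange(32), indexing="ij")


def volume(x, y, t):
    return np.sin(0.2 * x) + np.cos(0.15 * y) + np.sin(0.1 * t + 0.05 * x)


true_uvw = (0.3, -0.2, 0.1)
S = volume(x, y, t)
S_prime = volume(x - true_uvw[0], y - true_uvw[1], t - true_uvw[2])
print(estimate_global_shift(S, S_prime))  # approximately (0.3, -0.2, 0.1)
```

With S'(x, y, t) = S(x - u, y - v, t - w), a first-order Taylor expansion gives S' - S ≈ -[u v w]∇S, so the error term vanishes at the true displacement; larger displacements are what the warping and pyramid are for.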

Please replace paragraph [0142] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

--[0142] $P = [h_{11}\ h_{12}\ h_{13}\ h_{21}\ h_{22}\ h_{23}\ d_1\ d_2]$, i.e., eight unknowns.
The individual voxel error of Eqs. (3)-(5) becomes:--

Please replace paragraph [0144] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

--[0144] Model 2: 2D spatial projective transformation and a temporal offset. In
this case, $w(t) = d$ ($d$ is a real number, i.e., it could be a sub-frame shift), and
$P = [h_{11}\ h_{12}\ h_{13}\ h_{21}\ h_{22}\ h_{23}\ h_{31}\ h_{32}\ h_{33}\ d]$.



Please replace paragraph [0145] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

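--[0145] Each spatio-temporal "voxel" (x, y, t) provides one constraint:

$$e(x,y,t;P) = S' - S + \left[\frac{H_1 p}{H_3 p} - x,\ \ \frac{H_2 p}{H_3 p} - y,\ \ d\right] \nabla S$$

where $H_i$ denotes the i-th row of H and $p = [x\ y\ 1]^T$.--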
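Under Model 2, the displacement field entering the voxel constraint can be computed directly from H and d. The sketch below assumes the usual projective convention u = (H_1 p)/(H_3 p) - x, v = (H_2 p)/(H_3 p) - y and w = d, with p = [x, y, 1]^T and H_i the i-th row of the 3x3 matrix H; which sequence's coordinates H maps from is not restated here, and the function name is hypothetical.

```python
import numpy as np


def model2_displacement(H, d, x, y):
    """Displacement (u, v, w) of pixels (x, y) under a 3x3 projective H and temporal offset d.

    Assumed convention: p = [x, y, 1]^T, u = (H_1 p)/(H_3 p) - x,
    v = (H_2 p)/(H_3 p) - y, w = d, where H_i is the i-th row of H.
    """
    H = np.asarray(H, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    p = np.stack([x, y, np.ones_like(x)])  # shape (3, ...)
    q = np.tensordot(H, p, axes=1)          # q[i] = H_i p at every pixel
    u = q[0] / q[2] - x
    v = q[1] / q[2] - y
    w = np.full_like(x, float(d))           # w(t) = d for every voxel
    return u, v, w


# A pure translation by (2, -3) pixels combined with a sub-frame offset of 0.5 frame.
x, y = np.meshgrid(np.arange(4.0), np.arange(3.0))
H = [[1.0, 0.0, 2.0], [0.0, 1.0, -3.0], [0.0, 0.0, 1.0]]
u, v, w = model2_displacement(H, 0.5, x, y)
```

Since H is defined only up to scale, multiplying it by any non-zero constant leaves (u, v) unchanged; that is one reason the ratio form is nonlinear in the entries of H.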



Please replace paragraph [0146] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

--[0146] The 2D projective transformation is not linear in the unknown
parameters, and hence preferably undergoes some additional manipulation. To overcome
this non-linearity, Eq. (46) is multiplied by the denominator $(H_3 p)$, and renormalized
with its current estimate from the last iteration, leading to a slightly different error term:

$$e_{new}(x,y,t;P) = \frac{H_3 p}{\hat{H}_3 p} \cdot e_{old}(x,y,t;P) \qquad (57)$$--



Please replace paragraph [0147] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

--[0147] where $\hat{H}_3$ is the current estimate of $H_3$ in the iterative process, and
$e_{old}$ is as defined in Eq. (46).--

Please replace paragraph [0148] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

--[0148] Let $\hat{H}$ and $\hat{d}$ be the current estimates of H and d, respectively.
Substituting $H = \hat{H} + \delta H$ and $d = \hat{d} + \delta d$ into Eq. (57), and neglecting
high-order terms, leads to a new error term, which is linear in all unknown parameters
($\delta H$ and $\delta d$). In addition to second-order terms (e.g., $\delta H\,\delta d$), the
first-order term $\hat{d}\,\delta H_3$ is also negligible and can be ignored.--
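One way to see where the neglected terms come from is to expand the renormalized error explicitly. The expansion below is an illustrative sketch, not text of the application: it assumes the Model 2 constraint has the form $e_{old} = S' - S + [u\ v\ w]\nabla S$ with $u = H_1 p / H_3 p - x$, $v = H_2 p / H_3 p - y$, $w = d$, $p = [x\ y\ 1]^T$ and $\nabla S = [S_x\ S_y\ S_t]^T$.

```latex
\begin{aligned}
(H_3 p)\, e_{old}
  &= (H_3 p)(S' - S) + (H_1 p - x\, H_3 p)\, S_x + (H_2 p - y\, H_3 p)\, S_y + d\, (H_3 p)\, S_t, \\
e_{new} &= \frac{(H_3 p)\, e_{old}}{\hat{H}_3 p}, \\
d\, (H_3 p) &= (\hat{d} + \delta d)\,(\hat{H}_3 p + \delta H_3\, p) \\
  &= \hat{d}\, \hat{H}_3 p + \delta d\, \hat{H}_3 p + \hat{d}\, \delta H_3\, p + \delta d\, \delta H_3\, p .
\end{aligned}
```

Since $\hat{H}_3 p$ is a known number at each iteration, every term except $d\,(H_3 p)\,S_t$ is already linear in the entries of H. In the expansion of that term, $\delta d\,\delta H_3\,p$ is second order; dropping the first-order term $\hat{d}\,\delta H_3\,p$ as well, as the paragraph above does, leaves an error that is linear in $\delta H$ and $\delta d$.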

Please replace paragraph [0157] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

--[0157] (a) Warp $S'_i$ using the current parameter estimate: $\hat{S}'_i := \mathrm{warp}(S'_i; P)$.--
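Step (a) can be illustrated for the simplest parametric case, a constant spatio-temporal displacement, using tri-linear interpolation (the interpolation the text mentions when describing how the second sequence is warped in the accuracy experiments). The sketch assumes a backward warp in which output voxel (t, y, x) samples the input at (t + w, y + v, x + u), with sample positions clamped to the volume border; the affine and projective models would replace the constant offsets with per-voxel coordinates. The name `warp_constant` is hypothetical.

```python
import numpy as np


def warp_constant(seq, u, v, w):
    """Backward-warp a (T, H, W) volume by a constant displacement with tri-linear interpolation.

    Output voxel (t, y, x) takes the value of seq at (t + w, y + v, x + u).
    Sample positions outside the volume are clamped to the border.
    Each axis must have length >= 2.
    """
    seq = np.asarray(seq, dtype=float)
    T, H, W = seq.shape
    t, y, x = np.meshgrid(np.arange(T), np.arange(H), np.arange(W), indexing="ij")
    ts = np.clip(t + w, 0, T - 1)
    ys = np.clip(y + v, 0, H - 1)
    xs = np.clip(x + u, 0, W - 1)

    def base_and_frac(coord, n):
        # Lower neighbour index (kept <= n - 2 so index + 1 stays valid) and its weight.
        i0 = np.clip(np.floor(coord).astype(int), 0, n - 2)
        return i0, coord - i0

    t0, ft = base_and_frac(ts, T)
    y0, fy = base_and_frac(ys, H)
    x0, fx = base_and_frac(xs, W)
    out = np.zeros_like(seq)
    # Weighted sum over the 8 corners of the surrounding voxel cell.
    for kt in (0, 1):
        for ky in (0, 1):
            for kx in (0, 1):
                weight = ((ft if kt else 1 - ft)
                          * (fy if ky else 1 - fy)
                          * (fx if kx else 1 - fx))
                out += weight * seq[t0 + kt, y0 + ky, x0 + kx]
    return out


# A linear volume is reproduced exactly by tri-linear interpolation (away from the border).
T = H = W = 5
t, y, x = np.meshgrid(np.arange(T), np.arange(H), np.arange(W), indexing="ij")
vol = x + 10.0 * y + 100.0 * t
warped = warp_constant(vol, 0.5, 0.25, 0.5)
```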

Please replace paragraph [0162] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

--[0162] In our experiments, two different interlaced CCD cameras (mounted on tripods)
were used for sequence acquisition. No synchronization whatsoever was used. Typical
sequence length is several hundred frames. Lines (a)-(d) in FIG. 11 show a scene
with a car driving in a parking lot. The two input sequences, line 11(a) and line 11(b),
were taken from two different windows of a tall building. Line 11(c) displays superposition
of representative frames, generated by mixing the red and blue bands from the reference
sequence with the green band from the second sequence. This demonstrates the initial
misalignment between the two sequences, both in time and in space. Note the different
timing of the gate being lifted (temporal misalignment), and misalignment of static scene
parts, such as the parked car or the bushes (spatial misalignment). Line 11(d) shows the
superposition after applying spatio-temporal alignment. The second sequence was
spatio-temporally warped towards the reference sequence according to the computed
parameters. The recovered spatial affine transformation indicated a translation on the
order of 1/5 of the image size, a small rotation, a small scaling, and a small skew (due
to different aspect ratios of the two cameras). The recovered temporal shift was 46.63
frames. Therefore, opposite fields at a distance of 46 frames were mixed together when
applying the color superposition.--

Please replace paragraph [0163] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

--[0163] In FIG. 12, the sequences (a)-(d) illustrate that dynamic information cues are not
restricted to independent object motion. A light source was brightened and then dimmed
down, resulting in observable illumination variations in the scene. The cameras were
imaging a picture on a wall from significantly different viewing angles, inducing a
significant perspective distortion. Line (a) and line (b) show a few representative frames
from two sequences of several hundred frames each. The effects of illumination are
particularly evident in the upper left corner of the image. Note the difference in
illumination in frame 200 of the two sequences: frame 200 in line 12(a) and frame 200 in
line 12(b). Line 12(c) shows a superposition of the representative frames from both
sequences before spatio-temporal alignment. Line 12(d) shows superposition of
corresponding frames after spatio-temporal alignment. The correctness of the temporal
alignment is evident from the hue in the upper left corner of frame 200, which is pink
before alignment (frame 200 in line 12(c)) and white after temporal alignment (frame 200
in line 12(d)). The accuracy of the recovered temporal offset (21.32 frames) was verified
(up to 0.1 frame time) against the ground truth. The verification was implemented by
imaging a small object (a tennis ball) that moves very fast. The object was visible in
only three fields (not included in the part that was used for alignment). The tennis
ball locations enable us to verify manually the correct field-to-field temporal
correspondence. Furthermore, the phase differences of these locations (three in each
sequence) produce a sub-field-accuracy "ground truth". We manually distinguish between 5 phases

Please replace paragraph [0165] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

--[0165] In FIG. 13 the sequences (a)-(d) illustrate a case where the dynamic changes
within the sequence are due to non-rigid motion (a flag blowing in the wind). Line 13(a)
and line 13(b) show two representative frames out of several hundred. Line 13(c) shows a
superposition of the representative frames from both sequences before spatio-temporal
alignment. Line 13(d) shows superposition of corresponding frames after spatio-temporal
alignment. The recovered temporal offset was 31.43 frames. Image-to-image
alignment performs poorly in this case, even when applied to temporally corresponding
frames, as there is not enough spatial information in many of the individual frames. This
is shown in FIG. 10. We applied image-to-image alignment to all temporally
corresponding pairs of fields (odd fields from one camera with even fields from the
second camera, as the computed time shift (31.4) is closer to 31.5 than to 31.0). Only
55% of corresponding frames converged to accurate spatial alignment. The other 45%
suffered from noticeable spatial misalignment. A few representative frames (out of the
45% of misaligned pairs) are shown in FIG. 10, line (a). These pairs were well aligned by
sequence-to-sequence alignment (FIG. 10, line (b)).--
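The field-parity argument above can be written as a small helper. It assumes standard interlacing with two fields per frame, so a shift of s frames equals 2s fields; rounding to the nearest whole field then tells whether same-parity or opposite-parity fields should be paired. The helper name is hypothetical.

```python
def field_pairing(shift_frames):
    """Nearest whole-field reading of a temporal shift between two interlaced videos.

    Assumes two fields per frame, so a shift of s frames is 2*s fields.
    Returns (whole_frames, opposite_parity): opposite_parity is True when, e.g.,
    odd fields of one camera should be paired with even fields of the other.
    Floor division keeps the split consistent for negative shifts as well.
    """
    fields = round(2 * shift_frames)
    return fields // 2, fields % 2 == 1


print(field_pairing(31.43))  # (31, True): odd fields paired with even fields
print(field_pairing(46.63))  # (46, True): opposite fields, 46 frames apart
```

Both results agree with the field pairings described for FIG. 13 and FIG. 11.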

Please replace paragraph [0166] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

--[0166] FIGS. 14-16 illustrate that temporal changes may include changes in the
appearance of the entire scene. The sequences show an explosion of fireworks. The
fireworks change their appearance (size, shape, color and brightness) drastically
throughout the sequence. FIGS. 14 and 15 show ten representative frames from two
sequences of a few hundred frames each. Frames 20-110 are displayed from both
sequences. The event in frames 90-110 in the reference sequence (FIG. 14) is the same as
the event shown in frames 20-40 in the second sequence (FIG. 15). Line 16(a) displays
superposition of four representative frames (80-110) before applying spatio-temporal
alignment. The fireworks appear green and pink, due to the superposition of the different
bands from different sequences (red and blue from one sequence and green from the
other). The artificial colors are due to the mixture of misaligned fireworks with dark
background from the other sequence. Line 16(b) displays superposition of the same
representative frames after applying spatio-temporal alignment. The fireworks are now
white in the overlapping image regions, as they should be, implying good spatio-temporal
alignment.--

Please replace paragraph [0167] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

--[0167] The above results were mainly qualitative. To quantify the expected accuracy of
the method, we conducted several experiments where the exact ground truth alignment
was known. First, we warped a sequence using known spatio-temporal parameters,
applied our method to the warped and original sequences, and compared the extracted
parameters with the known ones. This produced highly accurate results: less than 0.01
frame time temporal error, and less than 0.02 pixels spatial error. The accurate results are
due to the fact that the source and warped sequences are highly correlated. The only
difference in the gray levels of corresponding "voxels" results from the tri-linear
interpolation used when warping the second sequence. To create a test where
the noise is less correlated, we split a sequence into its two fields. The two field
sequences are related by known temporal and spatial parameters: a temporal shift of 0.5
frame time, and a spatial shift of 0.5 pixel along the Y axis. Note that in this case the
data comes from the same camera, but from completely different sets of pixels (odd rows
in one sequence and even rows in the other sequence). We repeated the experiment
several (10) times using different sequences and different spatial models (affine,
projective). In all cases the temporal error was smaller than 0.02 frame time (i.e., the
recovered time shift was between 0.48 and 0.52). The error in the recovered Y-translation
was smaller than 0.03 pixel (i.e., the recovered Y-shift was between 0.47 and 0.53 pixel),
and the overall Euclidean error over the image was bounded by 0.1 pixels. To include
errors that result from using two cameras, we applied this test to pairs of sequences
from different cameras.--
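The field-splitting test can be sketched as below, assuming frames stored as a (T, H, W) array with an even number of rows. Splitting even and odd rows yields two sequences whose expected relative offset, under standard interlacing, is half a frame time in t and half a field-row in Y (adjacent frame rows are one frame-row, i.e. half a field-row, apart). The helper names are hypothetical, and the default tolerances simply mirror the bounds reported above.

```python
import numpy as np


def split_fields(frames):
    """Split interlaced frames of shape (T, H, W) into (even_rows, odd_rows) field sequences."""
    frames = np.asarray(frames)
    if frames.shape[1] % 2:
        raise ValueError("expected an even number of rows per frame")
    return frames[:, 0::2, :], frames[:, 1::2, :]


def within_reported_bounds(recovered_dt, recovered_dy, true_dt=0.5, true_dy=0.5,
                           tol_t=0.02, tol_y=0.03):
    """Check a recovered (dt, dy) against the known field offsets and error bounds."""
    return abs(recovered_dt - true_dt) < tol_t and abs(recovered_dy - true_dy) < tol_y


frames = np.arange(2 * 6 * 3).reshape(2, 6, 3)
even, odd = split_fields(frames)
print(even.shape, odd.shape)  # (2, 3, 3) (2, 3, 3)
```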

Please replace paragraph [0168] of the published patent application (US 
2002/0094135) with the following rewritten paragraph: 

--[0168] Each sequence was split into two sequences of odd and even fields. In this case
the ground truth is not given, but the relative change is known. That is, if the time shift
between the odd-field sequences from the first camera and from the reference camera is
$\delta t$, then the time shift between the odd-field sequence from the first camera and the
even-field sequence from the reference camera should be $\delta t + 0.5$, and the same holds
for spatial alignment. This also was performed several times, and in all cases the
temporal error was bounded by 0.05 frame time and the spatial error was bounded by 0.1 pixel.--